Road vehicles are today's primary form of transportation, so the safety of
child passengers must take precedence. Numerous toddler deaths in road
vehicles have been reported, including heatstroke and accidents caused by
negligent parents. In this research, we report a system developed to monitor
and detect a toddler's presence in a vehicle and to classify the toddler's
seatbelt status. The objective of the toddler monitoring system is to monitor
the child's condition to ensure the toddler's safety. The device senses the
toddler's seatbelt status and warns the driver if the child is left in the car
after the vehicle is powered off. The vision-based monitoring system employs
deep learning algorithms to recognize infants and seatbelts in the vehicle
interior. Due to its superior performance, the Nvidia Jetson Nano was selected
as the computational unit. Deep learning algorithms such as faster
region-based convolutional neural network (R-CNN), single shot detector
(SSD)-MobileNet, and SSD-Inception were utilized and compared for detection
and classification. From the results, the object detection algorithms running
on the Jetson Nano achieved up to 8.5 FPS with up to 82.98% accuracy, making
the system feasible for online, real-time in-vehicle monitoring with low power
requirements.

Keywords: Human detection, Neural network, Tensorflow, Vision-based system

1. INTRODUCTION

Automobiles are today the primary mode of human mobility. People are concerned about the safety
aspects of automotive specifications, such as the car's construction, car seats and airbags. However, a
crucial safety issue is that drivers tend to disregard the behaviour of the other passengers. Human
activity recognition (HAR) is a field of study that seeks to identify a person's actions based on sensor or
camera observation [1]-[4].

Due to the recent development of autonomous vehicles, a lot of research has been done on monitoring
the surroundings of vehicles [5], [6]. However, the monitoring of passengers has received far less attention.
Accidents are sometimes unpredictable and unpreventable, and they are not caused only by the carelessness or
drowsiness of the driver. What matters in an accident is knowing how to survive it. According to the Child
Accident Prevention Trust, twelve children under ten are killed or injured as passengers in cars every day.
An in-car safety monitoring system for toddlers is a system that recognizes the seatbelt condition of a
toddler in order to alert the parents. The focus of this research is to develop a relatively new system using
artificial intelligence that can recognize the seatbelt condition of toddlers in the backseat.


2. RELATED WORK 

To provide a concise introduction to object detection, this section gives an overview of relevant
research on the seatbelt detection problem. Object detection [7] is widely regarded as one of the most
crucial challenges in the area of computer vision, and the scope of object detection problems continues to
expand. To solve these problems, research and development groups frequently make use of cutting-edge
methods such as machine learning [8], [9].

As a subfield of artificial intelligence [10], [11], computer vision [12], [13] is described as the process
of teaching computers to comprehend the visual environment. It can identify the items or persons in a
picture by combining several sources of information, and it does so with a reasonable level of success [14].
Using digital images captured by cameras together with learning models, a computer can effectively detect
and discriminate between items. Thanks to advancements in deep learning [16], [17] and neural networks,
computer vision can now emulate humans in several tasks linked to recognizing and labelling things [15],
which was previously impossible. The underlying task is pattern recognition, which is carried out by
teaching a computer how to recognize different kinds of visual input. The autonomous vehicle, often known
as a self-driving automobile, is one of the more well-known applications of computer vision [18]. Computer
vision is sometimes referred to as "perception" in the area of autonomous cars since cameras are one of the
primary instruments that a vehicle uses to perceive its environment.

The first simulation of a perceptron was carried out by Frank Rosenblatt on an IBM 704 computer,
which ultimately resulted in the building of an electronic machine [19]. Machine learning is an area of
artificial intelligence that enables computers to learn from previous data or experiences without being
explicitly programmed [20]. Its primary emphasis is on developing computer systems that have access to data
and can learn from that data themselves. Identifying a pattern in a big dataset is one of many applications
for machine learning, which may be used in a variety of industries. The initial stage of machine learning is
the generation of example data, which involves the collection and preparation of data. The prepared data are
then fed into the machine to train it. Once the training procedure is complete, the model is deployed.
Creating additional example data could improve the model in the long run. The process of machine learning is
illustrated in Figure 1.


[Figure: cycle of three steps: generate example data, train a model, deploy the model]

Figure 1. Flow of machine learning


Neural networks are popular for human detection systems. Yan et al. used the region-based
convolutional neural network (R-CNN), which builds on the convolutional neural network (CNN), to recognize
the driver's behaviour, whereas Nikouei et al. and Bao et al. used lightweight convolutional neural networks
for real-time human detection and gender edge estimation [21]-[23]. Murthy et al., Yan et al. and Jose et al.
[24]-[26] used convolutional neural networks for human pose estimation, drowsiness detection and face
recognition. The hardware used ranges from computers with NVIDIA graphics processing units (GPU) to the
Raspberry Pi and Jetson Nano for image processing and classification. Nevertheless, most past research
emphasized detection at the front or driver seat. Less attention has been paid to the passengers,
especially toddlers. Furthermore, passenger safety is not addressed in these works. To fill these gaps, a
toddler monitoring system for vehicles using artificial intelligence is developed.


3. METHODOLOGY

In this section, SSD-Inception and SSD-Mobilenet are used as the networks for the toddler monitoring
system. Both networks are trained to detect the toddler's situation in the backseat. More specifically, the
main objective is divided into three tasks: (1) detect the presence of a toddler, (2) classify the safety
condition by detecting the seatbelt, and (3) compare the performance of the networks. The flow for the
monitoring system is shown in Figure 2.


[Figure: flowchart with the following boxes: identify the project title and establish problem statement,
objectives and scope; study past related research and journals; determine the requirements of the monitoring
system; data collection; image preprocessing; model training; testing the functionality of the trained model;
decision "adjustment or modification required?"; data validation; compare the performance of different neural
networks; build the confusion matrix of the toddler class; build the confusion matrix of the seatbelt class;
analysis and discussion of the results obtained; conclusion]

Figure 2. Flowchart of the toddler monitoring system


3.1. Hardware selection 

Due to the rapid growth of technology, many types of single-board computers, such as those from
Raspberry Pi, Intel and Nvidia, can be found on the market. The Nvidia Jetson Nano was chosen because of
compute unified device architecture (CUDA), the parallel computing platform and application programming
interface (API) model created by Nvidia. It enables developers to use a CUDA-enabled graphics processing
unit (GPU) for general-purpose processing, an approach known as general-purpose computing on graphics
processing units (GPGPU). Developers can significantly accelerate computing applications by leveraging the
power of GPUs through CUDA.


3.2. Algorithm development 

The design flow for the development of the monitoring system is shown in Figure 3. Firstly, images of
the toddler in the backseat are collected and preprocessed before labelling to ensure that all the images
have the same type and size. After the annotation is done, the images are fed into the system for training.
The trained system is then tested for functionality.

[Figure: flowchart from data collection through preprocessing and labelling to system training and testing]

Figure 3. Flowchart of toddler monitoring system design


3.3. Data collection and annotation 

The data are collected in video form (.mp4), and all the videos are extracted into images for training
purposes. The data are collected at the back seat of the vehicle. Examples of the collected images are
shown in Figure 4.


Figure 4. Sample images collected


Data annotation is the process of adding metadata to a dataset in preparation for training a machine
learning model. For every image, this process generates an annotation file that contains the bounding box
location of each region and the name of its annotation. The annotation file helps the machine learn certain
patterns and correlate the results. LabelImg is used as the graphical image annotation tool, as shown in
Figure 5. It can output an annotation file in Pascal VOC XML format. Two annotation classes, "toddler" and
"seatbelt", are defined so that the system can determine whether a toddler or a seatbelt is present.


Figure 5. LabelImg annotation process
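
As a rough illustration of what such an annotation file holds, the sketch below reads class names and bounding boxes from a Pascal VOC style XML string using only the Python standard library. The file name, image size and box coordinates in `SAMPLE_XML` are made-up values for illustration, and `read_voc_boxes` is a hypothetical helper, not part of LabelImg or of the authors' pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one frame, laid out in the Pascal VOC style.
# All values below are invented for illustration.
SAMPLE_XML = """
<annotation>
  <filename>frame_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>toddler</name>
    <bndbox><xmin>210</xmin><ymin>120</ymin><xmax>430</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>seatbelt</name>
    <bndbox><xmin>260</xmin><ymin>180</ymin><xmax>380</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_boxes(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return boxes


boxes = read_voc_boxes(SAMPLE_XML)
for label, coords in boxes:
    print(label, coords)
```

Each `object` element carries one class name and one box, so an image with two toddlers and one seatbelt would simply contain three such elements.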


3.4. System training 

The networks were trained on Google Colab using the GPU that it provides. The models are initialized
with the original SSD-Mobilenet and SSD-Inception, and only the output layers were retrained. Sample output
images from the trained network are shown in Table 1. The situation is classified by checking the number of
bounding boxes from the different classes in the image, as shown in Table 2.

Table 1. Sample of image result from trained network
SSD-Inception                                             [image]

Table 2. Sample of image result for classified images
Backseat condition/output of system                       Image result
Toddler is absent.                                        [image]
One or more toddlers are not using a seatbelt.            [image]
(Speaker starts to beep to alert driver)
All the detected toddlers are using the seatbelts.        [image]
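
The box-counting rule behind Table 2 can be sketched as follows. This is a minimal sketch under an assumption the text does not spell out: a toddler is treated as unbelted whenever fewer "seatbelt" boxes than "toddler" boxes are detected in the frame. The function names and the status labels `ABSENT`, `UNBELTED` and `BELTED` are hypothetical.

```python
from collections import Counter


def backseat_status(detected_labels):
    """Map the class labels detected in one frame to a backseat condition.

    Assumption (not stated in the paper): every belted toddler yields
    exactly one "seatbelt" box, so fewer seatbelt boxes than toddler
    boxes means at least one toddler is unbelted.
    """
    counts = Counter(detected_labels)
    toddlers = counts["toddler"]
    seatbelts = counts["seatbelt"]
    if toddlers == 0:
        return "ABSENT"      # Toddler is absent.
    if seatbelts < toddlers:
        return "UNBELTED"    # One or more toddlers are not using a seatbelt.
    return "BELTED"          # All detected toddlers are using seatbelts.


def should_beep(status):
    """The speaker alert applies only to the unbelted condition."""
    return status == "UNBELTED"


for labels in ([], ["toddler"], ["toddler", "seatbelt"],
               ["toddler", "toddler", "seatbelt"]):
    status = backseat_status(labels)
    print(labels, status, should_beep(status))
```

Counting boxes does not check that a seatbelt box actually overlaps the toddler it belongs to; a stricter variant could pair each toddler box with an overlapping seatbelt box.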
4. RESULT

The results are compared by using the confusion matrix method. Table 3 shows the confusion matrices
used for the performance comparison. The accuracy, precision and recall are calculated using (1)-(3).
Tables 3(a) and 3(b) provide the confusion matrix data from which the performance is calculated, and the
results are summarized in graph form in Figure 6.


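\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false
negatives, respectively.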


Table 3. Confusion matrix of (a) SSD-Mobilenet and (b) SSD-Inception

(a) SSD-Mobilenet
                              Toddler                      Seatbelt
                         Predicted                    Predicted
                         No      Yes     Total        No      Yes     Total
Actual No                 3        3         6        50        1        51
Actual Yes               41      141       182        31      106       137
Total                    44      144       188        81      107       188
Correctly detected               75%                        56.38%

(b) SSD-Inception
                              Toddler                      Seatbelt
                         Predicted                    Predicted
                         No      Yes     Total        No      Yes     Total
Actual No                 5        1         6        50        1        51
Actual Yes               41      141       182        35      102       137
Total                    46      142       188        85      103       188
Correctly detected               75%                        54.26%
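
As a worked check, equations (1)-(3) can be applied directly to the counts in Table 3(a). The sketch below does this for SSD-Mobilenet, taking "yes" as the positive class; the function name `confusion_metrics` is a hypothetical helper introduced here for illustration.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall as defined in equations (1)-(3)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts read from Table 3(a), SSD-Mobilenet ("yes" is the positive class).
toddler = confusion_metrics(tp=141, tn=3, fp=3, fn=41)
seatbelt = confusion_metrics(tp=106, tn=50, fp=1, fn=31)

for name, metrics in (("toddler", toddler), ("seatbelt", seatbelt)):
    summary = ", ".join(f"{key} {value:.2%}" for key, value in metrics.items())
    print(f"{name}: {summary}")
# toddler: accuracy 76.60%, precision 97.92%, recall 77.47%
# seatbelt: accuracy 82.98%, precision 99.07%, recall 77.37%
```

The very high precision alongside the lower recall reflects the counts themselves: both classes have few false positives (3 and 1) but many false negatives (41 and 31).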


From Table 4, it was found that SSD-Inception gives better performance when detecting the toddler
class, with 77.70% accuracy, 99.30% precision and 77.47% recall, while SSD-Mobilenet performs better on the
seatbelt class, with 82.98% accuracy, 99.07% precision and 77.37% recall. The performance difference between
the two neural networks is only slight. SSD-Mobilenet has a higher frame rate, 8.5 frames per second (FPS),
than SSD-Inception, which reaches 5.7 FPS. SSD-Mobilenet can therefore respond faster than SSD-Inception
while giving nearly the same accuracy. The performance comparison between the networks is also shown in
Table 5 and in Figure 6(a) and (b). The TensorBoard function is used to obtain the mean average precision
(mAP) at 0.5 intersection over union (IoU) for both neural networks, as shown in Table 5.
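
To make the speed difference concrete, the reported frame rates can be converted into the average time spent per frame. The sketch below only rearranges the two FPS values quoted above; `frame_time_ms` is a hypothetical helper name.

```python
def frame_time_ms(fps):
    """Average processing time per frame in milliseconds for a given FPS."""
    return 1000.0 / fps


# Frame rates reported for the Jetson Nano in this section.
REPORTED_FPS = {"SSD-Mobilenet": 8.5, "SSD-Inception": 5.7}

for name, fps in REPORTED_FPS.items():
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
# SSD-Mobilenet: 117.6 ms per frame
# SSD-Inception: 175.4 ms per frame
```

On these figures, SSD-Mobilenet delivers each result roughly 58 ms sooner than SSD-Inception.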


Table 4. Performance of neural network

                                       Toddler                                 Seatbelt
Type of neural network    FPS   Accuracy  Precision  Recall       FPS   Accuracy  Precision  Recall
SSD-Mobilenet             8.5    76.60%    97.92%    77.47%       8.5    82.98%    99.07%    77.37%
SSD-Inception             5.7    77.70%    99.30%    77.47%       5.7    80.85%    99.03%    74.45%


Table 5. Performance of neural network

                      Average precision by class       Mean average
                      Seatbelt        Toddler          precision
SSD-Mobilenet         0.94129         0.980041         0.973779
SSD-Inception         0.88132         0.927863         0.904591


[Figure: graphs summarizing the performance values, (a) toddler class and (b) seatbelt class]

Figure 6. The performance of neural network (a) Toddler class and (b) Seatbelt class
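
The 0.5 IoU setting used for the mAP values in Table 5 refers to how closely a predicted box must overlap a labelled box before it counts as a correct detection. The sketch below shows the usual IoU calculation for axis-aligned boxes given as (xmin, ymin, xmax, ymax). It is a generic illustration, not the evaluation code used by TensorBoard or the authors, and whether the comparison is "greater than" or "greater than or equal to" the threshold depends on the evaluator; `iou` and `is_match` are hypothetical names.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted, labelled, threshold=0.5):
    """Assumed rule: a prediction counts when IoU reaches the threshold."""
    return iou(predicted, labelled) >= threshold


labelled = (0, 0, 10, 10)
print(iou((2, 0, 12, 10), labelled), is_match((2, 0, 12, 10), labelled))
print(iou((5, 0, 15, 10), labelled), is_match((5, 0, 15, 10), labelled))
```

In the first case the boxes share 80 of 120 units of area (IoU about 0.67), so the prediction counts; in the second they share 50 of 150 (IoU about 0.33), so it does not.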


5. CONCLUSION 

This paper presents comprehensive work on the design and development of a toddler monitoring
system that determines the seatbelt condition of a toddler in the backseat and informs the driver about the
toddler's safety condition. The toddler monitoring system is vision-based and uses neural networks. The
Jetson Nano is used as the computing unit for the system because it is powerful enough to run neural networks
for object detection. The SSD-type neural network is the best choice for the Jetson Nano because it needs
less processing power from the mobile controller. Within the SSD family, SSD-Inception and SSD-Mobilenet
were chosen and compared, and the results of this comparison are shown in the previous section. It can be
concluded that SSD-Mobilenet performs better in speed, measured in FPS when processing video, while the
accuracy of the two neural networks does not differ greatly. As the work progressed, several ideas for
future expansion and development were noted. For future improvements, the system's accuracy and sensitivity
need to be improved by enlarging the dataset with more varied data, such as different vehicle models and
toddlers of different ages and skin colors. Furthermore, the system could interact with the vehicle to obtain
a more accurate output. To achieve this, cooperation with vehicle companies is needed.